Example: Hyperparameter tuning
This advanced example shows how to optimize a model's hyperparameters in a multi-metric run.
We import the breast cancer dataset from sklearn.datasets. This is a small, easy-to-train dataset whose goal is to predict whether a patient has breast cancer.
Load the data
In [1]:
# Import packages
from sklearn.datasets import load_breast_cancer
from optuna.distributions import IntDistribution
from atom import ATOMClassifier
In [2]:
# Load the data
X, y = load_breast_cancer(return_X_y=True)
Run the pipeline
In [3]:
# Initialize atom
atom = ATOMClassifier(X, y, n_jobs=4, verbose=2, random_state=1)
<< ================== ATOM ================== >>
Algorithm task: binary classification.
Parallel processing with 4 cores.

Dataset stats ==================== >>
Shape: (569, 31)
Memory: 138.96 kB
Scaled: False
Outlier values: 167 (1.2%)
-------------------------------------
Train set size: 456
Test set size: 113
-------------------------------------
|   | dataset     | train       | test        |
| - | ----------- | ----------- | ----------- |
| 0 | 212 (1.0)   | 170 (1.0)   | 42 (1.0)    |
| 1 | 357 (1.7)   | 286 (1.7)   | 71 (1.7)    |
In [4]:
# Train a MultiLayerPerceptron model on two metrics
# using a custom number of hidden layers
atom.run(
models="MLP",
metric=["f1", "ap"],
n_trials=10,
est_params={"activation": "relu"},
ht_params={
"distributions": {
"hidden_layer_1": IntDistribution(2, 4),
"hidden_layer_2": IntDistribution(10, 20),
"hidden_layer_3": IntDistribution(10, 20),
"hidden_layer_4": IntDistribution(2, 4),
}
}
)
Training ========================= >>
Models: MLP
Metric: f1, average_precision

Running hyperparameter tuning for MultiLayerPerceptron...

| trial | hidden_layer_1 | hidden_layer_2 | hidden_layer_3 | hidden_layer_4 | f1 | best_f1 | average_precision | best_average_precision | time_trial | time_ht | state |
| ----- | -------------- | -------------- | -------------- | -------------- | ------- | ------- | ----------------- | ---------------------- | ---------- | ------- | -------- |
| 0 | 3 | 17 | 10 | 2 | 0.9455 | 0.9455 | 0.9837 | 0.9837 | 0.865s | 0.865s | COMPLETE |
| 1 | 2 | 11 | 12 | 3 | 0.9739 | 0.9739 | 0.9988 | 0.9988 | 0.871s | 1.736s | COMPLETE |
| 2 | 3 | 15 | 14 | 4 | 0.9913 | 0.9913 | 1.0 | 1.0 | 0.866s | 2.601s | COMPLETE |
| 3 | 2 | 19 | 10 | 4 | 0.9649 | 0.9913 | 0.9867 | 1.0 | 0.875s | 3.476s | COMPLETE |
| 4 | 3 | 16 | 11 | 2 | 0.9655 | 0.9913 | 0.998 | 1.0 | 0.848s | 4.324s | COMPLETE |
| 5 | 4 | 20 | 13 | 4 | 0.9821 | 0.9913 | 0.9994 | 1.0 | 0.854s | 5.178s | COMPLETE |
| 6 | 4 | 19 | 10 | 2 | 0.9825 | 0.9913 | 0.9901 | 1.0 | 0.866s | 6.043s | COMPLETE |
| 7 | 2 | 19 | 11 | 3 | 0.7703 | 0.9913 | 0.9991 | 1.0 | 0.857s | 6.900s | COMPLETE |
| 8 | 4 | 15 | 17 | 2 | 0.9913 | 0.9913 | 0.9997 | 1.0 | 0.899s | 7.799s | COMPLETE |
| 9 | 4 | 19 | 10 | 4 | 0.9739 | 0.9913 | 0.9813 | 1.0 | 0.888s | 8.687s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 2
Best parameters:
 --> hidden_layer_sizes: (3, 15, 14, 4)
Best evaluation --> f1: 0.9913   average_precision: 1.0
Time elapsed: 8.687s
Fit ---------------------------------------------
Train evaluation --> f1: 0.993   average_precision: 0.998
Test evaluation --> f1: 0.9861   average_precision: 0.995
Time elapsed: 1.244s
-------------------------------------------------
Total time: 9.931s

Final results ==================== >>
Total time: 9.971s
-------------------------------------
MultiLayerPerceptron --> f1: 0.9861 average_precision: 0.995
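The four `hidden_layer_N` distributions each sample one layer's size, and atom combines them into the single `hidden_layer_sizes` tuple that `MLPClassifier` expects (e.g., trial 2 sampled 3, 15, 14 and 4, yielding `hidden_layer_sizes=(3, 15, 14, 4)`). A minimal sketch of that mapping, using a plain `params` dict in place of an Optuna trial:

```python
# Hypothetical sampled values, matching trial 2 of the run above
params = {
    "hidden_layer_1": 3,
    "hidden_layer_2": 15,
    "hidden_layer_3": 14,
    "hidden_layer_4": 4,
}

# Collect the per-layer sizes in order into the tuple MLPClassifier expects
hidden_layer_sizes = tuple(params[f"hidden_layer_{i}"] for i in range(1, 5))
print(hidden_layer_sizes)  # (3, 15, 14, 4)
```

Any number of layers can be tuned this way: one `IntDistribution` per layer, combined in order.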
In [5]:
# For multi-metric runs, the selected best trial is the first in the Pareto front
atom.mlp.best_trial
Out[5]:
FrozenTrial(number=2, values=[0.9913043478260869, 1.0000000000000002], datetime_start=datetime.datetime(2022, 10, 2, 13, 25, 45, 590081), datetime_complete=datetime.datetime(2022, 10, 2, 13, 25, 46, 454867), params={'hidden_layer_1': 3, 'hidden_layer_2': 15, 'hidden_layer_3': 14, 'hidden_layer_4': 4}, distributions={'hidden_layer_1': IntDistribution(high=4, log=False, low=2, step=1), 'hidden_layer_2': IntDistribution(high=20, log=False, low=10, step=1), 'hidden_layer_3': IntDistribution(high=20, log=False, low=10, step=1), 'hidden_layer_4': IntDistribution(high=4, log=False, low=2, step=1)}, user_attrs={'params': {'hidden_layer_1': 3,
'hidden_layer_2': 15,
'hidden_layer_3': 14,
'hidden_layer_4': 4}, 'estimator': MLPClassifier(hidden_layer_sizes=(3, 15, 14, 4), random_state=1)}, system_attrs={'nsga2:generation': 0}, intermediate_values={}, trial_id=2, state=TrialState.COMPLETE, value=None)
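In a multi-metric run there is no single "best" trial: the candidates are the Pareto front, i.e., the trials not dominated on every metric by another trial. A pure-Python sketch of that non-domination check, using the (f1, average_precision) scores copied from the tuning table above:

```python
# (f1, average_precision) per trial, from the tuning table above
scores = {
    0: (0.9455, 0.9837),
    1: (0.9739, 0.9988),
    2: (0.9913, 1.0),
    3: (0.9649, 0.9867),
    4: (0.9655, 0.9980),
    5: (0.9821, 0.9994),
    6: (0.9825, 0.9901),
    7: (0.7703, 0.9991),
    8: (0.9913, 0.9997),
    9: (0.9739, 0.9813),
}

def dominates(a, b):
    """True if a is at least as good as b on every metric and better on one."""
    return all(x >= y for x, y in zip(a, b)) and a != b

# Trials no other trial dominates form the Pareto front
pareto = [
    t for t, s in scores.items()
    if not any(dominates(scores[u], s) for u in scores if u != t)
]
print(pareto)  # [2]
```

Here trial 2 scores highest on both metrics, so the front collapses to a single trial, matching the run's reported best trial. Optuna exposes the same set as `study.best_trials`.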
In [6]:
# Use plot_pareto_front to visualize the metric trade-off and select a better option
from optuna.visualization import plot_pareto_front
plot_pareto_front(atom.mlp.study)
In [7]:
# If you are unhappy with the results, it's possible to continue the study
atom.mlp.hyperparameter_tuning(n_trials=5)
Running hyperparameter tuning for MultiLayerPerceptron...

| trial | hidden_layer_1 | hidden_layer_2 | hidden_layer_3 | hidden_layer_4 | f1 | best_f1 | average_precision | best_average_precision | time_trial | time_ht | state |
| ----- | -------------- | -------------- | -------------- | -------------- | ------- | ------- | ----------------- | ---------------------- | ---------- | ------- | -------- |
| 10 | 4 | 18 | 13 | 4 | 1.0 | 1.0 | 1.0 | 1.0 | 0.911s | 9.598s | COMPLETE |
| 11 | 2 | 14 | 19 | 2 | 0.9492 | 1.0 | 0.9899 | 1.0 | 0.903s | 10.501s | COMPLETE |
| 12 | 2 | 11 | 10 | 4 | 0.7703 | 1.0 | 0.99 | 1.0 | 1.164s | 11.665s | COMPLETE |
| 13 | 2 | 12 | 15 | 2 | 0.9643 | 1.0 | 0.9813 | 1.0 | 0.852s | 12.516s | COMPLETE |
| 14 | 3 | 11 | 16 | 4 | 0.7703 | 1.0 | 0.9724 | 1.0 | 0.863s | 13.379s | COMPLETE |

Hyperparameter tuning ---------------------------
Best trial --> 10
Best parameters:
 --> hidden_layer_sizes: (4, 18, 13, 4)
Best evaluation --> f1: 1.0   average_precision: 1.0
Time elapsed: 13.379s
In [8]:
# The trials attribute gives an overview of the trial results
atom.mlp.trials
Out[8]:
| trial | params | estimator | score | time_trial | time_ht | state |
| ----- | ------ | --------- | ----- | ---------- | ------- | ----- |
| 0 | {'hidden_layer_sizes': (3, 17, 10, 2)} | MLPClassifier(hidden_layer_sizes=(3, 17, 10, 2... | [0.9454545454545454, 0.9837236558914353] | 0.864786 | 0.864786 | COMPLETE |
| 1 | {'hidden_layer_sizes': (2, 11, 12, 3)} | MLPClassifier(hidden_layer_sizes=(2, 11, 12, 3... | [0.9739130434782608, 0.9988003322156944] | 0.870792 | 1.735578 | COMPLETE |
| 2 | {'hidden_layer_sizes': (3, 15, 14, 4)} | MLPClassifier(hidden_layer_sizes=(3, 15, 14, 4... | [0.9913043478260869, 1.0000000000000002] | 0.865787 | 2.601365 | COMPLETE |
| 3 | {'hidden_layer_sizes': (2, 19, 10, 4)} | MLPClassifier(hidden_layer_sizes=(2, 19, 10, 4... | [0.9649122807017544, 0.9867431480369178] | 0.874797 | 3.476162 | COMPLETE |
| 4 | {'hidden_layer_sizes': (3, 16, 11, 2)} | MLPClassifier(hidden_layer_sizes=(3, 16, 11, 2... | [0.9655172413793103, 0.9980213692125051] | 0.847772 | 4.323934 | COMPLETE |
| 5 | {'hidden_layer_sizes': (4, 20, 13, 4)} | MLPClassifier(hidden_layer_sizes=(4, 20, 13, 4... | [0.9821428571428572, 0.999389732649834] | 0.853777 | 5.177711 | COMPLETE |
| 6 | {'hidden_layer_sizes': (4, 19, 10, 2)} | MLPClassifier(hidden_layer_sizes=(4, 19, 10, 2... | [0.9824561403508771, 0.990093407959159] | 0.865788 | 6.043499 | COMPLETE |
| 7 | {'hidden_layer_sizes': (2, 19, 11, 3)} | MLPClassifier(hidden_layer_sizes=(2, 19, 11, 3... | [0.7702702702702703, 0.9990764494418141] | 0.85678 | 6.900279 | COMPLETE |
| 8 | {'hidden_layer_sizes': (4, 15, 17, 2)} | MLPClassifier(hidden_layer_sizes=(4, 15, 17, 2... | [0.9913043478260869, 0.9996975196612221] | 0.898818 | 7.799097 | COMPLETE |
| 9 | {'hidden_layer_sizes': (4, 19, 10, 4)} | MLPClassifier(hidden_layer_sizes=(4, 19, 10, 4... | [0.9739130434782608, 0.9813127743443262] | 0.887808 | 8.686905 | COMPLETE |
| 10 | {'hidden_layer_sizes': (4, 18, 13, 4)} | MLPClassifier(hidden_layer_sizes=(4, 18, 13, 4... | [1.0, 1.0000000000000002] | 0.91083 | 9.597735 | COMPLETE |
| 11 | {'hidden_layer_sizes': (2, 14, 19, 2)} | MLPClassifier(hidden_layer_sizes=(2, 14, 19, 2... | [0.9491525423728813, 0.9899476963066745] | 0.902822 | 10.500557 | COMPLETE |
| 12 | {'hidden_layer_sizes': (2, 11, 10, 4)} | MLPClassifier(hidden_layer_sizes=(2, 11, 10, 4... | [0.7702702702702703, 0.9900232191286547] | 1.16406 | 11.664617 | COMPLETE |
| 13 | {'hidden_layer_sizes': (2, 12, 15, 2)} | MLPClassifier(hidden_layer_sizes=(2, 12, 15, 2... | [0.9642857142857142, 0.9812621686248989] | 0.851775 | 12.516392 | COMPLETE |
| 14 | {'hidden_layer_sizes': (3, 11, 16, 4)} | MLPClassifier(hidden_layer_sizes=(3, 11, 16, 4... | [0.7702702702702703, 0.9723670235061694] | 0.862786 | 13.379178 | COMPLETE |
In [9]:
# Select a custom best trial...
atom.mlp.best_trial = 2
# ...and check that the best parameters are now those in the selected trial
atom.mlp.best_params
Out[9]:
{'hidden_layer_sizes': (3, 15, 14, 4)}
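Overriding `best_trial` like this lets you apply any selection rule you prefer instead of the default Pareto-front choice. As an illustration only (pure Python, with scores for a few top trials copied from the trials table above), here is a hypothetical rule that picks the trial with the highest unweighted mean of the two metrics:

```python
# (f1, average_precision) for a few top trials, from the trials table above
scores = {
    2: (0.9913, 1.0),
    8: (0.9913, 0.9997),
    10: (1.0, 1.0),
}

# Hypothetical selection rule: highest unweighted mean across both metrics
best = max(scores, key=lambda t: sum(scores[t]) / len(scores[t]))
print(best)  # 10
```

A weighted mean (e.g., favoring f1 over average_precision) is a one-line change to the key function.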
In [10]:
# Lastly, fit the model on the complete training set
# using the new combination of hyperparameters
atom.mlp.fit()
Fit ---------------------------------------------
Train evaluation --> f1: 0.9948   average_precision: 0.9994
Test evaluation --> f1: 0.9861   average_precision: 0.997
Time elapsed: 2.507s
Analyze the results
In [11]:
atom.plot_trials()